FINAL PROJECT
i. Data preparation
i.i. Loading the data
In [84]:
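# Install the required packages before importing and configuring them.
!pip install pyreadstat
!pip install wbgapi
!pip install "vegafusion[embed]>=1.5.0"
!pip install "vl-convert-python>=1.6.0"

import pandas as pd
import altair as alt

# VegaFusion lets Altair plot datasets above its default 5,000-row limit.
alt.data_transformers.enable("vegafusion")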
In [85]:
modulo_1 = pd.read_csv('/content/Enaho01-2024-100.csv', encoding='latin1')
modulo_32 = pd.read_csv('/content/Sumaria-2024.csv', encoding='latin1')
modulo_32.head()
Out[85]:
| | AÑO | MES | CONGLOME | VIVIENDA | HOGAR | UBIGEO | DOMINIO | ESTRATO | MIEPERHO | TOTMIEHO | ... | ESTRSOCIAL | LD | LINPE | LINEA | POBREZA | FACTOR07 | LINEAV | POBREZAV | NCONGLOME | SUB_CONGLOME |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2024 | 1 | 15009 | 13 | 11 | 10101 | 4 | 4 | 2 | 2 | ... | 4 | 0.815806 | 231.437622 | 382.024597 | 3 | 79.816757 | 672.335510 | 4 | 7098 | 0 |
| 1 | 2024 | 1 | 15009 | 47 | 11 | 10101 | 4 | 4 | 3 | 3 | ... | 4 | 0.815806 | 231.437622 | 382.024597 | 3 | 79.816757 | 684.988831 | 4 | 7098 | 0 |
| 2 | 2024 | 1 | 15009 | 59 | 11 | 10101 | 4 | 4 | 1 | 1 | ... | 4 | 0.815806 | 231.437622 | 382.024597 | 3 | 79.816757 | 705.972351 | 4 | 7098 | 0 |
| 3 | 2024 | 1 | 15009 | 71 | 11 | 10101 | 4 | 4 | 2 | 2 | ... | 4 | 0.815806 | 231.437622 | 382.024597 | 3 | 79.816757 | 703.466370 | 4 | 7098 | 0 |
| 4 | 2024 | 1 | 15009 | 84 | 11 | 10101 | 4 | 4 | 5 | 5 | ... | 4 | 0.815806 | 231.437622 | 382.024597 | 3 | 79.816757 | 686.349243 | 3 | 7098 | 0 |
5 rows × 163 columns
i.ii. Data cleaning
In [86]:
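# Keep only the household keys and the five unmet-basic-needs (NBI) indicators.
columnas_m1 = ['CONGLOME', 'VIVIENDA', 'HOGAR', 'DOMINIO',
               'NBI1', 'NBI2', 'NBI3', 'NBI4', 'NBI5']
# .copy() avoids a SettingWithCopyWarning when columns are added later.
modulo_1 = modulo_1[columnas_m1].copy()
print(modulo_1.columns)
modulo_1.head()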
Index(['CONGLOME', 'VIVIENDA', 'HOGAR', 'DOMINIO', 'NBI1', 'NBI2', 'NBI3',
'NBI4', 'NBI5'],
dtype='object')
Out[86]:
| | CONGLOME | VIVIENDA | HOGAR | DOMINIO | NBI1 | NBI2 | NBI3 | NBI4 | NBI5 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 15006 | 13 | 11 | 4 | 0 | 0 | 0 | 0 | 0 |
| 1 | 15006 | 27 | 11 | 4 | 0 | 0 | 0 | 0 | 0 |
| 2 | 15006 | 50 | 11 | 4 | 0 | 0 | 0 | 0 | 0 |
| 3 | 15006 | 64 | 11 | 4 | 0 | 0 | 0 | 0 | 0 |
| 4 | 15006 | 76 | 11 | 4 | 0 | 0 | 0 | 0 | 0 |
In [87]:
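# Keep the household keys plus size, income earners, poverty status, spending and income.
columnas_m32 = ['CONGLOME', 'VIVIENDA', 'DOMINIO', 'HOGAR', 'MIEPERHO',
                'PERCEPHO', 'POBREZA', 'GASHOG2D', 'INGHOG2D']
# .copy() avoids a SettingWithCopyWarning when columns are added later.
modulo_32 = modulo_32[columnas_m32].copy()
print(modulo_32.columns)
modulo_32.head()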
Index(['CONGLOME', 'VIVIENDA', 'DOMINIO', 'HOGAR', 'MIEPERHO', 'PERCEPHO',
'POBREZA', 'GASHOG2D', 'INGHOG2D'],
dtype='object')
Out[87]:
| | CONGLOME | VIVIENDA | DOMINIO | HOGAR | MIEPERHO | PERCEPHO | POBREZA | GASHOG2D | INGHOG2D |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 15009 | 13 | 4 | 11 | 2 | 2 | 3 | 34188.218750 | 52162.609375 |
| 1 | 15009 | 47 | 4 | 11 | 3 | 2 | 3 | 40164.945312 | 40832.042969 |
| 2 | 15009 | 59 | 4 | 11 | 1 | 1 | 3 | 12308.838867 | 15098.497070 |
| 3 | 15009 | 71 | 4 | 11 | 2 | 2 | 3 | 30316.724609 | 41082.953125 |
| 4 | 15009 | 84 | 4 | 11 | 5 | 2 | 3 | 33076.910156 | 47659.160156 |
In [88]:
# Classify annual household income (INGHOG2D) into three brackets.
def categoria(i):
    if i < 13000:
        return 'Ingreso Bajo'
    elif i < 35000:
        return 'Ingreso Medio'
    else:
        return 'Ingreso Alto'

modulo_32['CATEGING'] = modulo_32['INGHOG2D'].apply(categoria)
print(modulo_32[['CATEGING']])
            CATEGING
0       Ingreso Alto
1       Ingreso Alto
2      Ingreso Medio
3       Ingreso Alto
4       Ingreso Alto
...              ...
33686   Ingreso Alto
33687   Ingreso Alto
33688   Ingreso Alto
33689   Ingreso Alto
33690  Ingreso Medio

[33691 rows x 1 columns]
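The same bracketing can be expressed without a row-by-row function. The sketch below is one possible alternative, not the notebook's method: it uses `pd.cut` with left-closed bins so the boundaries match the comparisons `i < 13000` and `i < 35000`. The sample incomes are hypothetical. One difference to keep in mind: a missing income stays missing with `pd.cut`, whereas `categoria` would label it 'Ingreso Alto'.

```python
import numpy as np
import pandas as pd

# Hypothetical annual household incomes (not ENAHO data).
ingresos = pd.Series([5000.0, 13000.0, 34999.9, 35000.0, 127551.6],
                     name="INGHOG2D")

# Bin edges and labels taken from the thresholds used in categoria().
limites = [-np.inf, 13000, 35000, np.inf]
etiquetas = ["Ingreso Bajo", "Ingreso Medio", "Ingreso Alto"]

# right=False makes each bin include its lower edge and exclude its upper edge.
categ = pd.cut(ingresos, bins=limites, labels=etiquetas, right=False)
print(categ.tolist())
```

The result is an ordered categorical, so the brackets keep their low-to-high order when sorted.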
In [89]:
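# NBI4 = 1 flags a household with children who do not attend school.
def asistencia(i):
    # Compare as text so the check works whether NBI4 was read as int or str.
    if str(i).strip() in ('1', '1.0'):
        return 'No asiste'
    else:
        return 'Asiste'

modulo_1['CATEGNBI4'] = modulo_1['NBI4'].apply(asistencia)
print(modulo_1[['CATEGNBI4']])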
       CATEGNBI4
0         Asiste
1         Asiste
2         Asiste
3         Asiste
4         Asiste
...          ...
44726     Asiste
44727     Asiste
44728     Asiste
44729     Asiste
44730     Asiste

[44731 rows x 1 columns]
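Whether a comparison against `'1'` or against `1` matches depends on how `read_csv` typed the NBI4 column, which the notebook does not show. A minimal sketch of a type-independent version follows, assuming NBI4 may arrive as integers, strings, or blanks; the sample values are hypothetical. Missing or blank values end up as 'Asiste', the same fallback the `else` branch gives.

```python
import pandas as pd

# Hypothetical NBI4 values in the forms a CSV import can produce.
nbi4 = pd.Series([0, 1, "1", " 1", "0", " ", None], dtype="object")

# Normalize to text, trim spaces, then coerce; non-numeric entries become NaN.
nbi4_num = pd.to_numeric(nbi4.astype(str).str.strip(), errors="coerce")

# NaN compares as False, so blanks and missing values map to 'Asiste'.
categ_nbi4 = nbi4_num.eq(1).map({True: "No asiste", False: "Asiste"})
print(categ_nbi4.tolist())
```

Checking `modulo_1['NBI4'].dtype` before choosing the comparison is a quick way to see which case applies.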
In [90]:
# Inner join on the household keys: only households present in both modules remain.
enaho = pd.merge(modulo_1, modulo_32,
                 on=['CONGLOME', 'VIVIENDA', 'HOGAR', 'DOMINIO'])
enaho.head()
Out[90]:
| | CONGLOME | VIVIENDA | HOGAR | DOMINIO | NBI1 | NBI2 | NBI3 | NBI4 | NBI5 | CATEGNBI4 | MIEPERHO | PERCEPHO | POBREZA | GASHOG2D | INGHOG2D | CATEGING |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 15006 | 13 | 11 | 4 | 0 | 0 | 0 | 0 | 0 | Asiste | 2 | 2 | 3 | 12711.547852 | 12983.209961 | Ingreso Bajo |
| 1 | 15006 | 27 | 11 | 4 | 0 | 0 | 0 | 0 | 0 | Asiste | 3 | 3 | 2 | 8784.480469 | 8993.144531 | Ingreso Bajo |
| 2 | 15006 | 50 | 11 | 4 | 0 | 0 | 0 | 0 | 0 | Asiste | 4 | 4 | 3 | 44404.941406 | 127551.609375 | Ingreso Alto |
| 3 | 15006 | 64 | 11 | 4 | 0 | 0 | 0 | 0 | 0 | Asiste | 4 | 3 | 2 | 12542.700195 | 16807.876953 | Ingreso Medio |
| 4 | 15006 | 76 | 11 | 4 | 0 | 0 | 0 | 0 | 0 | Asiste | 2 | 1 | 3 | 17669.095703 | 17385.957031 | Ingreso Medio |
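The first module has 44,731 rows and the merged frame has 33,691, so the default inner join drops the households that have no match in the Sumaria file. One way to inspect such a join is sketched below with hypothetical miniature frames: `validate` raises an error if a key repeats on either side, and `indicator` adds a `_merge` column recording where each row came from.

```python
import pandas as pd

claves = ["CONGLOME", "VIVIENDA", "HOGAR"]

# Hypothetical miniature versions of the two modules (not ENAHO data).
m1 = pd.DataFrame({"CONGLOME": [15006, 15006, 15009],
                   "VIVIENDA": [13, 27, 13],
                   "HOGAR": [11, 11, 11],
                   "NBI4": [0, 0, 1]})
m32 = pd.DataFrame({"CONGLOME": [15006, 15006],
                    "VIVIENDA": [13, 27],
                    "HOGAR": [11, 11],
                    "INGHOG2D": [12983.2, 8993.1]})

# An outer join keeps unmatched rows so they can be counted.
revision = pd.merge(m1, m32, on=claves, how="outer",
                    validate="one_to_one", indicator=True)
print(revision["_merge"].value_counts())
```

Rows tagged `left_only` are the ones an inner join would silently discard.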
In [91]:
# Map the DOMINIO code (1-8) to the name of its geographic domain.
def region(d):
    dominio = [1, 2, 3, 4, 5, 6, 7, 8]
    regiones = ["Costa Norte", "Costa Centro", "Costa Sur",
                "Sierra Norte", "Sierra Centro", "Sierra Sur",
                "Selva", "Lima Metropolitana"]
    if d in dominio:
        nom_regiones = dominio.index(d)
        return regiones[nom_regiones]
    return None  # code outside 1-8

enaho['GEO'] = enaho['DOMINIO'].apply(region)
print(enaho[['GEO']])
                GEO
0      Sierra Norte
1      Sierra Norte
2      Sierra Norte
3      Sierra Norte
4      Sierra Norte
...             ...
33686         Selva
33687         Selva
33688         Selva
33689         Selva
33690         Selva

[33691 rows x 1 columns]
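A lookup like this is often written as a dictionary passed to `Series.map`. The sketch below is an alternative under that assumption, using the same eight code-to-name pairs as `region`; the name `REGIONES` and the sample codes are hypothetical. Codes missing from the dictionary become NaN, which makes them easy to count.

```python
import pandas as pd

# Same code-to-name pairs as the region() function.
REGIONES = {
    1: "Costa Norte", 2: "Costa Centro", 3: "Costa Sur",
    4: "Sierra Norte", 5: "Sierra Centro", 6: "Sierra Sur",
    7: "Selva", 8: "Lima Metropolitana",
}

# Hypothetical DOMINIO codes; 9 is deliberately outside the mapping.
dominio = pd.Series([4, 7, 8, 9], name="DOMINIO")

geo = dominio.map(REGIONES)
print(geo.tolist())
print("Unmapped codes:", geo.isna().sum())
```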
In [92]:
enaho.columns
Out[92]:
Index(['CONGLOME', 'VIVIENDA', 'HOGAR', 'DOMINIO', 'NBI1', 'NBI2', 'NBI3',
'NBI4', 'NBI5', 'CATEGNBI4', 'MIEPERHO', 'PERCEPHO', 'POBREZA',
'GASHOG2D', 'INGHOG2D', 'CATEGING', 'GEO'],
dtype='object')
Charts
In [93]:
grafico1 = alt.Chart(enaho, width=300, height=300).mark_bar().encode(
    x = alt.X("CATEGING:O",
              title = "Income category"),
    y = alt.Y("count()",
              title = "Number of households"),
    color = alt.Color("CATEGNBI4:N",
                      title = "School attendance status"),
    # xOffset places the attendance bars side by side within each category.
    xOffset = alt.XOffset('CATEGNBI4:N')
).interactive().properties(
    title={
        "text": "Relationship between annual income and school attendance",
        "subtitle": "Bar chart - Source: ENAHO 2024",
        "color": "black",
        # "Light red" is not a CSS color name; "lightcoral" is a valid light red.
        "subtitleColor": "lightcoral"
    }
)
grafico1
Out[93]:
The grouped bar chart shows that children attend school regardless of differences in household income. The number of households with children who do not attend school is below 500. Contrary to what one might intuitively expect, low-income households have the fewest cases of non-attendance in absolute terms. This positions access to education as a right that is being successfully realized across the country.
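The bar chart compares household counts, and the income categories do not contain the same number of households, so a share within each category can complement the counts. The sketch below shows one way to compute it with `pd.crosstab` and row normalization. The sample households are hypothetical and do not reproduce ENAHO results.

```python
import pandas as pd

# Hypothetical households (not ENAHO results).
hogares = pd.DataFrame({
    "CATEGING": ["Ingreso Bajo"] * 4 + ["Ingreso Medio"] * 4 + ["Ingreso Alto"] * 2,
    "CATEGNBI4": ["Asiste", "Asiste", "Asiste", "No asiste",
                  "Asiste", "Asiste", "No asiste", "No asiste",
                  "Asiste", "Asiste"],
})

# normalize="index" turns each row into shares that sum to 1.
tasas = pd.crosstab(hogares["CATEGING"], hogares["CATEGNBI4"],
                    normalize="index")
print(tasas.round(2))
```

Applied to the `enaho` frame, the 'No asiste' column would give the non-attendance share for each income category.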
In [94]:
grafico2 = alt.Chart(enaho).mark_point(filled = True).encode(
    x = alt.X("GEO:N", title='Geographic location'),
    y = alt.Y("GASHOG2D:Q", title='Annual household spending'),
    color = alt.Color("GEO:N", title='Geographic location',
                      legend=alt.Legend(orient='bottom', titleOrient='left')
    ),
    # One panel per geographic domain.
    column = alt.Column("GEO:N")
).properties(width=300, height=400).interactive().properties(
    title={
        "text": "Relationship between geographic location and annual household spending",
        "subtitle": "Small multiples - Source: ENAHO 2024",
        "color": "black",
        # "Light red" is not a CSS color name; "lightcoral" is a valid light red.
        "subtitleColor": "lightcoral"
    }
)
grafico2
Out[94]:
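With tens of thousands of overlapping points per panel, a numeric summary by domain can make the comparison easier to read. The sketch below is one possible complement to the chart, using `groupby` with a few aggregates. The sample spending values are hypothetical.

```python
import pandas as pd

# Hypothetical annual spending by domain (not ENAHO data).
gastos = pd.DataFrame({
    "GEO": ["Selva", "Selva", "Selva",
            "Lima Metropolitana", "Lima Metropolitana"],
    "GASHOG2D": [10000.0, 20000.0, 30000.0, 40000.0, 60000.0],
})

# Count, median and maximum of household spending for each domain.
resumen = gastos.groupby("GEO")["GASHOG2D"].agg(["count", "median", "max"])
print(resumen)
```

The median is used here because a few very high-spending households can pull the mean upward.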